# The contents of this file are subject to the Mozilla Public License
# (MPL) Version 1.1 (the "License"); you may not use this file except
# in compliance with the License. You may obtain a copy of the License
# at http://www.mozilla.org/MPL/
#
# Software distributed under the License is distributed on an "AS IS"
# basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
# the License for the specific language governing rights and
# limitations under the License.
#
# The Original Code is LEPL (http://www.acooke.org/lepl)
# The Initial Developer of the Original Code is Andrew Cooke.
# Portions created by the Initial Developer are Copyright (C) 2009-2010
# Andrew Cooke (andrew@acooke.org). All Rights Reserved.
#
# Alternatively, the contents of this file may be used under the terms
# of the LGPL license (the GNU Lesser General Public License,
# http://www.gnu.org/licenses/lgpl.html), in which case the provisions
# of the LGPL License are applicable instead of those above.
#
# If you wish to allow use of your version of this file only under the
# terms of the LGPL License and not to allow others to use your version
# of this file under the MPL, indicate your decision by deleting the
# provisions above and replace them with the notice and other provisions
# required by the LGPL License.  If you do not delete the provisions
# above, a recipient may use your version of this file under either the
# MPL or the LGPL License.

'''
Stream interfaces for the input, with an implementation using singly linked
lists.

This package defines both interfaces (via ABCs) and implementations.


Interfaces
----------

The simplest interface is `SimpleStream` .  This requires only a subset of 
the standard Python collection interface, and so is compatible with str, 
list, etc.  A `SimpleStream` implementation is all that is expected by the 
basic `parse` and `match` methods (and `null_parser` and `null_matcher`).

`LocationStream` then extends `SimpleStream` to add extra methods
that provide useful information for debugging and error messages (location 
in file etc).  Implementations of `LocationStream` are created by the
type-specific parse/match methods (`string_parser` etc).

Further interfaces exist (`StreamFactory` and `Source`), but they are of less
interest to end-users.

`StreamFactory` is the interface used to generate a stream during matching
(using the methods in `OperatorMatcher` like `parse_string`).  It is taken
from the `Configuration` object, so alternative configurations can supply
different streams.

`Source` is the interface used to wrap input in a standard way so that the
`Line` class (generated by `DefaultStreamFactory`) is as lightweight as 
possible.  See discussion below.


Implementations
---------------

Sometimes LEPL processes data from a single block of memory (a string, for
example).  Other times it may process a stream of data - eg characters
or tokens or lines from a file.

To minimise development and maintenance work it would be preferable to 
unify all these via a single source.  For example, we could always construct
an in-memory array of data and then use that.  This has the advantage of
being simple and using a minimal total amount of resources; the disadvantages
are that all input must be available immediately and that the instantaneous
amount of resources consumed is high.

An alternative approach would be to treat everything as a stream of values
and then add a linked list interface to give the minimal persistence
required for backtracking (see details below).  This is also conceptually
fairly simple, and the instantaneous resource use at any one time is
small; the disadvantages are that the total resource use over time is 
high and some operations - like calculating the total length, needed for
left recursive grammars, are very expensive.

To try get the best of both solutions described above I am going with an
intermediate approach which holds a "line" of data in memory in an array.
In most cases this will be natural text line (eg terminated by a newline), 
but they could also be of arbitrary length.

The fundamental class is then Line, which holds a line of data in a 
linked list with other Line instances.  The client code interacts with this
via `StreamView`, which implements the `LocationStream` interface and stores 
state for a particular character location.  Because there are many `StreamView` 
instances (potentially one per character), and also, possibly, many Lines, 
we take care to make them compact.

One result of keeping things compact is that there's no inheritance tree for
those classes; instead different sources are wrapped in `Source` instances 
which provide a general iterator over "lines".


Hashes
------

Hashing (and equality) of streams is important because it is used by the 
memoisation classes.  Normally a hash depends on (1) the underlying source
and (2) the position of the stream within the source.  This changes slightly
for tokens, where (1) becomes the content of the token.

Things are complicated slightly because the location within the source
depends on the offset in the stream view relative to the line, plus the
offset of the line relative to the source.

We are also constrained by the following design decisions: (i) the stream
must resemble a string or similar; (ii) lines are implemented by a single
class that delegates source-specific functionality to the source.

The simplest solution (although perhaps not the most efficient) seems to
be for the stream view to use hash(line) ^ offset, and for hash(line) to
be delegated to the source (and similarly for equality).
'''

from abc import ABCMeta, abstractmethod, abstractproperty
from io import StringIO, IOBase

from lepl.support.lib import open_stop, sample, format, basestring, str, \
    LogMixin


class StreamException(Exception):
    '''
    Error raised by classes in this module.
    '''

#class _SimpleStream(metaclass=ABCMeta):
# Python 2.6
# pylint: disable-msg=W0105, C0103
_SimpleStream = ABCMeta('_SimpleStream', (object, ), {})
'''ABC used to identify streams.'''


# pylint: disable-msg=E1002,E1001
# (pylint bug?  this chains back to a new style abc)
# pylint: disable-msg=W0232
# defines only an interface
# pylint: disable-msg=R0903
# using __ methods
class SimpleStream(_SimpleStream):
    '''
    The minimal interface that matchers expect to be implemented.
    '''
    
    __slots__ = []

    @abstractmethod
    def __getitem__(self, spec):
        '''
        [n] returns a character (string of length 1)
        [n:] returns a new SimpleStream instance that starts at the offset n
        [n:m] returns a sequence (ie string, list, etc)
        '''
        pass
    
    @abstractmethod
    def __bool__(self):
        '''
        Non-empty?
        '''
        pass
    
    def __nonzero__(self):
        '''
        Called only by 2.6 (when __bool__ not called).
        '''
        return self.__bool__() 
    
    @abstractmethod
    def __len__(self):
        '''
        Amount of remaining data.
        
        This may be expensive if the data are in a file, but is needed for
        left recursion handling.
        '''
        pass
    
    @abstractmethod
    def __repr__(self):
        pass
    
    @abstractmethod
    def __str__(self):
        pass
    
    @abstractmethod
    def __hash__(self):
        pass
    
    @abstractmethod
    def __eq__(self, other):
        pass
    

# pylint: disable-msg=E1101
# pylint unaware of ABCs
SimpleStream.register(basestring)
SimpleStream.register(list)


class LocationStream(SimpleStream):
    '''
    Additional methods available on "real" streams.
    '''
    
    __slots__ = []
    
    @abstractproperty
    def location(self):
        '''
        A tuple containing line number, line offset, character offset,
        the line currently being processed, and a description of the source.
        
        The line number and offsets are -1 if this is past the end of the file.
        '''

    @abstractproperty
    def line_number(self):
        '''
        The line number (one indexed) from the source.
        '''
        return self.location()[0]
        
    @abstractproperty
    def line_offset(self):
        '''
        The position within the current line (zero indexed).
        '''
        return self.location()[1]
        
    @abstractproperty
    def character_offset(self):
        '''
        The character offset (zero indexed) for the entire data.
        '''
        return self.location()[2]
   
    @abstractproperty
    def text(self):
        '''
        The line of text in the underlying line indexed by this stream,
        starting at the offset.  Needed by ``Regexp`` for strings.
        '''
    
    @abstractproperty
    def source(self):
        '''
        This exposes the join and description attributes from the source,
        which are necessary to calculate derived sources (eg tokens).
        '''
    

# this doesn't extend LocationStream, instead we register with it
# (i believe this saves memory space)
class StreamView(object):
    '''
    A view into a Line that implements LocationStream.
    
    This is intended to be as compact as possible.  Methods don't "count"
    because they are not per-instance, and putting the logic here, rather
    than in Line, makes it easy to iterate (rather than recurse) to the
    end of the data when necessary (for length). 
    '''
    
    __slots__ = ('_StreamView__line', '_StreamView__offset', '__weakref__')

    def __init__(self, line, offset=0):
        self.__line = line
        self.__offset = offset
        
    # pylint: disable-msg=E1103
    # slice not inferred
    def __getitem__(self, index):
        '''
        [n] returns a character (string of length 1)
        [n:] returns a new StreamView instance that starts at the offset n
        [n:m] returns a sequence (ie string, list, etc)
        '''
        
        # [n]
        if isinstance(index, int):
            (line, index) = self.__at(index, True)
            return line.line[index]
        
        if index.step is not None:
            raise IndexError('Slice step not supported')
        
        if index.start is None:
            raise IndexError('Slice start must be specified')
        
        # [n:]
        if open_stop(index):
            if index.start == 0 and self.__offset == 0:
                return self
            return StreamView(*self.__at(index.start))
        
        # [n:m]
        length = index.stop - index.start
        if not length:
            return self.__line.source.join([])
        (line, start) = self.__at(index.start, True)
        line_length = len(line.line)
        remainder = length - (line_length - start)
        lines = [line.line[start:min(start+length, line_length)]]
        while line.line and remainder > 0:
            line = line.next
            if line.line:
                line_length = len(line.line)
                lines.append(line.line[0:min(remainder, line_length)])
                remainder -= line_length
        if remainder > 0:
            raise IndexError(format('Missing {0:d} items', remainder))
        else:
            # in the code above we are careful to accumulate exactly what
            # we need; you might think it simpler to accumulate whole lines
            # and then take a slice below, but not all joins preserve length
            # (consider SOL/EOL).
            return line.source.join(lines)
            
    def __at(self, index, strict=False):
        '''
        Return a (line, index) pair, for which the index lies within the
        line.  Otherwise, raise an error.  
        
        If strict is False, (None, 0) is a possible return value.
        '''
        line = self.__line
        index += self.__offset
        while line.line and index >= len(line.line):
            index -= len(line.line)
            line = line.next
        # it's possible for index to be zero and line to be empty!
        if (not line.line) and (strict or index):
            raise IndexError()
        return (line, index)
    
    def __bool__(self):
        # looking at this code later it seems odd that "offset" doesn't
        # appear here.  what i think happens is that when we are at the end
        # of the stream (when this will return false) we have an empty 
        # underlying line.  in other words, remember that this is a 
        # proxy to the entire stream, and so this reflects whether there
        # is any data in the entire stream, not only in the current line. 
        return bool(self.__line.line)
    
    def __nonzero__(self):
        return self.__bool__()
    
    def __len__(self):
        '''
        Calculate the total length (ie "from here on"), if necessary, and 
        store on the class.
        '''
        line = self.__line
        while line.source.total_length is None:
            line = line.next
        return line.source.total_length - (self.__line.previous_length + 
                                           self.__offset)
    
    def __repr__(self):
        return format('{0!r}[{1:d}:]', self.__line, self.__offset)
        
    def __str__(self):
        # pylint: disable-msg=W0702
        # we want this to be robust
        try:
            return str(self.text)
        except:
            return repr(self)
    
    def __hash__(self):
        return hash(self.__line) ^ self.__offset
    
    def __eq__(self, other):
        return type(self) is type(other) and \
            self.__offset == other.__offset and \
            self.__line == other.__line
    
    @property
    def location(self):
        '''
        A tuple containing line number, line offset, character offset,
        the line currently being processed, and a description of the source.
        
        The line number and offsets are -1 if this is past the end of the file.
        '''
        return self.__line.location(self.__offset)
    
    @property
    def line_number(self):
        '''
        The line number (one indexed) from the source.
        '''
        return self.location[0]
        
    @property
    def line_offset(self):
        '''
        The position within the current line (zero indexed).
        '''
        return self.location[1]
        
    @property
    def character_offset(self):
        '''
        The character offset (zero indexed) for the entire data.
        '''
        return self.location[2]
   
    @property
    def text(self):
        '''
        Provide the current line.
        '''
        return self.__line.text(self.__offset)
    
    @property
    def source(self):
        '''
        Expose the underlying source.
        '''
        return self.__line.source


LocationStream.register(StreamView)


#class StreamFactory(metaclass=ABCMeta):
# Python 2.6
# pylint: disable-msg=W0105, C0103
_StreamFactory = ABCMeta('_StreamFactory', (object, ), {})
'''ABC used to identify stream factories.'''


def list_join(lists):
    '''
    Join function for lists (appends lists together).
    '''
    joined = []
    for list_ in lists:
        joined.extend(list_)
    return joined


class StreamFactory(_StreamFactory):
    '''
    Support for Stream Factories.
    '''
    
    @abstractmethod
    def from_file(self, file_):
        '''
        Provide a stream for the contents of the file.
        '''
        
    @abstractmethod
    def from_path(self, path):
        '''
        Provide a stream for the contents of the file at the given path.
        '''
    
    @abstractmethod
    def from_string(self, text):
        '''
        Provide a stream for the contents of the string.
        '''

    @abstractmethod
    def from_lines(self, lines, source=None, join=''.join):
        '''
        Provide a stream for the contents of an iterator over lines.
        '''
    
    @abstractmethod
    def from_items(self, items, source=None, line_length=80):
        '''
        Provide a stream for the contents of an iterator over items
        (ie characters).
        '''
    
    @staticmethod
    def from_null(stream):
        '''
        Return the underlying data with no modification.
        '''
        assert isinstance(stream, SimpleStream), type(stream)
        return stream
    
    
class _Line(object):
    '''
    Provide a common base class for all lines.
    '''


class DefaultStreamFactory(StreamFactory, LogMixin):
    '''
    A source of Line instances, parameterised by the source.
    '''
    
    def __init__(self):
        super(StreamFactory, self).__init__()

    def __call__(self, source_):
        
        class Line(_Line):
            '''
            This class is specific to a single dataset.
            '''
            source = source_
            __slots__ = ['line', 'previous_length', 'location_state', 
                         '_Line__next']
            
            def __init__(self, line=None, previous_length=0, 
                         location_state=None):
                super(Line, self).__init__()
                self.line = line
                self.previous_length = previous_length
                self.location_state = location_state
                self.__next = None
                
            def location(self, offset):
                '''
                Provide location from the source.
                '''
                return self.source.location(offset, self.line, 
                                            self.location_state)
                    
            @property
            def next(self):
                '''
                Iterate over lines.
                '''
                if not self.__next:
                    (line, location_state) = next(self.source)
                    try:
                        previous_length = self.previous_length + len(self.line)
                    except TypeError:
                        previous_length = self.previous_length
                    self.__next = Line(line, previous_length, location_state)
                return self.__next
            
            def text(self, offset):
                '''
                The current line.
                '''
                return self.source.text(offset, self.line)
            
            def __repr__(self):
                return repr(self.line)
            
            def __hash__(self):
                return self.source.hash_line(self)
            
            def __eq__(self, other):
                return isinstance(other, _Line) and \
                    self.source.eq_line(self, other)
    
        return StreamView(Line().next)
    

    def from_path(self, path, source=None, join=''.join, 
                  flags='rt', buffering=1):
        '''
        Open the file with line buffering.
        '''
        if source is None:
            source = path
        return self(LineSource(open(path, flags, buffering=buffering), 
                               source, join))
    
    def from_string(self, text, source=None, join=''.join, base=None):
        '''
        Wrap a string.
        '''
        if source is None:
            source = sample('str: ', repr(text))
        if base is None:
            base = text
        return self(LineSource(StringIO(str(text)), source, join, base=base))
    
    def from_lines(self, lines, source=None, join=''.join):
        '''
        Wrap an iterator over lines (strings, by default, but could be a 
        list of lists for example).
        '''
        if source is None:
            source = sample('lines: ', repr(lines))
        return self(LineSource(lines, source, join))
    
    def from_items(self, items, source=None, join=list_join, 
                   sub_list=False, line_length=80):
        '''
        Wrap an iterator over items (or a list).
        '''
        if source is None:
            source = sample('items: ', repr(items))
        return self(CharacterSource(items, source, join=join, 
                                    line_length=line_length, sub_list=sub_list))
    
    def from_file(self, file_, source=None, join=''.join):
        '''
        Wrap a file.
        '''
        if source is None:
            source = getattr(file_, 'name', None)
        return self.from_lines(file_, source, join)
    
    def auto(self, source, **kargs):
        '''
        Auto-detect type and wrap appropriately.
        '''
        if isinstance(source, basestring):
            return self.from_string(source, **kargs)
        elif isinstance(source, IOBase):
            # this will use file name attribute if available, then treat as
            # a source of lines
            return self.from_file(source, **kargs)
        else:
            return self.from_null(source, **kargs)
            

DEFAULT_STREAM_FACTORY = DefaultStreamFactory()


#class Source(metaclass=ABCMeta):
# Python 2.6
# pylint: disable-msg=W0105, C0103
_Source = ABCMeta('_Source', (object, ), {})
'''ABC used to identify sources.'''


class Source(_Source):
    '''
    Support for sources.
    '''
    
    def __init__(self, description=None, join=''.join, base=None):
        '''
        `description` and `join` should be obvious; base is used as the
        source of identity.
        '''
        self.__description = description
        self.join = join
        self.total_length = None
        self.__base = base
        self.__cached_hash = super(Source, self).__hash__() \
            if base is None else hash(self.__base)
    
    def __iter__(self):
        return self
    
    @abstractmethod
    def __next__(self):
        '''
        Subclasses should return (line, location_state) where line is None
        when the underlying stream has expired.  This should NOT raise
        StopIteration - that is handled by StreamView.
        '''
    
    def next(self):
        '''
        Python 2.6
        '''
        return self.__next__()
    
    @abstractmethod
    def location(self, offset, line, location_state):
        '''
        A tuple containing line number, line offset, character offset,
        the line currently being processed, and a description of the source.
        '''
    
    # pylint: disable-msg=R0201
    # optional interface
    def text(self, _offset, _line):
        '''
        Subclasses should override this to return the current line, 
        if supported.
        '''
        raise StreamException(
            format('This source ({0!s}) does not support lines.',
                   self.__class__.__name__))
    
    def __str__(self):
        '''
        A description of the source.
        '''
        return self.__description
    
    def __hash__(self):
        return self.__cached_hash
    
    def __eq__(self, other):
        return type(other) is type(self) and \
            other is self if self.__base is None \
            else other.__base == self.__base 
    
    def hash_line(self, line):
        raise Exception(format('No hash for {0!r}, {1!r} ({2})',
                               line.line, line.location, 
                               self.__class__.__name__))
        
    def eq_line(self, line, other):
        raise Exception(format('No eq for {0!r}, {1!r} ({2})',
                               line, other, self.__class__.__name__))
 
 
class LineSource(Source):
    '''
    Wrap a source of lines (like a file iterator, or, more abstractly, 
    anything consistent with IOBase), so that it provides both the line and 
    associated state that can be used later, with an offset, to calculate 
    location.
    '''
    
    def __init__(self, lines, description=None, join=''.join, base=None):
        '''
        `lines` must provide an iterator over the lines, description will be 
        provided as part of location, and join is used to combine lines 
        together.
        `base` is the base hash for the entire stream (if not give a unique
        value is used).
        '''
        super(LineSource, self).__init__(
                        repr(lines) if description is None else description,
                        join, base)
        self.__lines = iter(lines)
        self.__line_count = 1 # one-based indexing
        self.__character_count = 0
    
    def __next__(self):
        '''
        Note that this is infinite - it is the StreamView that detects when
        the Line is empty and terminates any processing by the user.
        '''
        try:
            line = next(self.__lines)
            character_count = self.__character_count
            self.__character_count += len(line)
            line_count = self.__line_count
            self.__line_count += 1
            return (line, (character_count, line_count))
        except StopIteration:
            self.total_length = self.__character_count
            return (self.join([]), (-1, -1))
    
    def location(self, offset, line, location_state):
        '''
        A tuple containing line number, line offset, character offset,
        the line currently being processed, and a description of the source.
        '''
        (character_count, line_count) = location_state
        return (line_count, offset, character_count + offset, 
                line, str(self))
        
    def text(self, offset, line):
        '''
        The current line.
        '''
        if line:
            return line[offset:]
        else:
            return self.join([])
        
    def hash_line(self, line):
        '''
        The hash for a line depends on the entire location.
        '''
        (character_count, line_count) = line.location_state
        return (character_count + 100 * line_count) ^ hash(self)
        
    def eq_line(self, line, other):
        '''
        Equality depends on matching location and base.
        '''
        return line.location_state == other.location_state \
            and self == other.source
    
    
class _SubList(list):
    
    def __str__(self):
        return str([x[0] for x in self])
    
    def __repr__(self):
        return repr([x[0] for x in self])
    
    
class CharacterSource(Source):
    '''
    Wrap a sequence of characters (like a string or list) so that it provides
    "lines" in chunks of the given size.  Note that location has no concept
    of line number (lines are only an implementation detail).  This means,
    amongst other things, that Python regexps will not work with this source 
    (since they work on a "per line" basis).
    '''
    
    def __init__(self, characters, description=None, join=list_join, 
                 line_length=80, sub_list=False):
        '''
        line_length is the size on which items are grouped.
        
        sub_list indicates whether each each item is wrapped in a list.
        If true then they can be joined with list_join.
        '''
        super(CharacterSource, self).__init__(
                    repr(characters) if description is None else description,
                    join=join)          
        self.__characters = iter(characters)
        self.__line_length = line_length
        self.__sub_list = sub_list
        self.__character_count = 0
    
    def __next__(self):
        line = _SubList() if self.__sub_list else []
        for _index in range(self.__line_length):
            try:
                value = next(self.__characters)
                line.append([value] if self.__sub_list else value)
            except StopIteration:
                break
        if line:
            character_count = self.__character_count
            self.__character_count += len(line)
            return (line, character_count)
        else:
            self.total_length = self.__character_count
            return (self.join([]), -1)
    
    def location(self, offset, line, location_state):
        '''
        A tuple containing line number, line offset, character offset,
        the line currently being processed, and a description of the source.
        '''
        character_count = location_state
        if line is None:
            return (-1, -1, -1, None, str(self))
        else:
            return (None, offset, character_count + offset, 
                    line, str(self))
        
    def hash_line(self, line):
        return line.location_state ^ hash(self)
        
    def eq_line(self, line, other):
        return line.location_state == other.location_state \
            and line.line == other.line \
            and other.source == self
        